Bayesian Additive Regression Trees With Parametric Models of Heteroskedasticity
We incorporate heteroskedasticity into Bayesian Additive Regression Trees
(BART) by modeling the log of the error variance parameter as a linear function
of prespecified covariates. Under this scheme, the Gibbs sampling procedure for
the original sum-of-trees model is easily modified, and the parameters for the
variance model are updated via a Metropolis-Hastings step. We demonstrate the
promise of our approach by providing more appropriate posterior predictive
intervals than homoskedastic BART in heteroskedastic settings and demonstrating
the model's resistance to overfitting. Our implementation will be offered in an
upcoming release of the R package bartMachine. Comment: 20 pages, 5 figures
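The variance model described in this abstract can be written out as a short sketch. The notation here is an assumption made for illustration and is not taken from the paper: $g_t$ denotes the $t$-th tree's fit, $m$ the number of trees, $z_i$ the prespecified variance covariates for observation $i$, and $(\gamma_0, \gamma)$ the coefficients of the log-variance model.

```latex
\begin{align*}
y_i &= \sum_{t=1}^{m} g_t(x_i) + \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma_i^2), \\
\log \sigma_i^2 &= \gamma_0 + z_i^\top \gamma .
\end{align*}
```

Under this reading, homoskedastic BART is the special case $\gamma = 0$, in which every observation shares the variance $\exp(\gamma_0)$.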
Extensions and Applications of Ensemble-of-trees Methods in Machine Learning
Ensemble-of-trees algorithms have moved to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits.
These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity.
The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.
Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation
This article presents Individual Conditional Expectation (ICE) plots, a tool
for visualizing the model estimated by any supervised learning algorithm.
Classical partial dependence plots (PDPs) help visualize the average partial
relationship between the predicted response and one or more features. In the
presence of substantial interaction effects, the partial response relationship
can be heterogeneous. Thus, an average curve, such as the PDP, can obfuscate
the complexity of the modeled relationship. Accordingly, ICE plots refine the
partial dependence plot by graphing the functional relationship between the
predicted response and the feature for individual observations. Specifically,
ICE plots highlight the variation in the fitted values across the range of a
covariate, suggesting where and to what extent heterogeneities might exist. In
addition to providing a plotting suite for exploratory analysis, we include a
visual test for additive structure in the data generating model. Through
simulated examples and real data sets, we demonstrate how ICE plots can shed
light on estimated models in ways PDPs cannot. Procedures outlined are
available in the R package ICEbox. Comment: 22 pages, 14 figures, 2 algorithms
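The relationship between ICE curves and the PDP described in this abstract can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the ICEbox implementation: `ice_curves` and `toy_model` are hypothetical names, and the toy model is an assumed stand-in for a fitted learner whose slope in feature 0 flips sign with feature 1.

```python
import numpy as np


def ice_curves(predict, X, feature, grid):
    """Return an (n_obs, n_grid) array of ICE curves.

    Row i holds the predictions for observation i as column `feature`
    is swept over `grid` while its other features stay at observed values.
    """
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value      # overwrite one feature for every row
        curves[:, j] = predict(X_mod)  # one prediction per observation
    return curves


def toy_model(X):
    # Hypothetical "fitted model" with an interaction:
    # the effect of x0 is +x0 when x1 > 0 and -x0 otherwise.
    return X[:, 0] * np.where(X[:, 1] > 0, 1.0, -1.0)


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
grid = np.linspace(-1.0, 1.0, 21)

curves = ice_curves(toy_model, X, feature=0, grid=grid)
pdp = curves.mean(axis=0)  # the PDP is the pointwise average of the ICE curves
```

In this toy setting the individual curves have slope +1 or -1, while their average is much flatter, which is the kind of heterogeneity that an average curve alone can hide.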
bartMachine: Machine Learning with Bayesian Additive Regression Trees
We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction. It is significantly faster than the current R implementation, parallelized, and capable of handling both large sample sizes and high-dimensional data.
Variable selection for BART: An application to gene regulation
We consider the task of discovering gene regulatory networks, which are
defined as sets of genes and the corresponding transcription factors which
regulate their expression levels. This can be viewed as a variable selection
problem, potentially with high dimensionality. Variable selection is especially
challenging in high-dimensional settings, where it is difficult to detect
subtle individual effects and interactions between predictors. Bayesian
Additive Regression Trees [BART, Ann. Appl. Stat. 4 (2010) 266-298] provides a
novel nonparametric alternative to parametric regression approaches, such as
the lasso or stepwise regression, especially when the number of relevant
predictors is sparse relative to the total number of available predictors and
the fundamental relationships are nonlinear. We develop a principled
permutation-based inferential approach for determining when the effect of a
selected predictor is likely to be real. Going further, we adapt the BART
procedure to incorporate informed prior information about variable importance.
We present simulations demonstrating that our method compares favorably to
existing parametric and nonparametric procedures in a variety of data settings.
To demonstrate the potential of our approach in a biological context, we apply
it to the task of inferring the gene regulatory network in yeast (Saccharomyces
cerevisiae). We find that our BART-based procedure is best able to recover the
subset of covariates with the largest signal compared to other variable
selection methods. The methods developed in this work are readily available in
the R package bartMachine. Comment: Published at http://dx.doi.org/10.1214/14-AOAS755 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Hunting and mountain sheep: Do current harvest practices affect horn growth?
The influence of human harvest on evolution of secondary sexual characteristics has implications for sustainable management of wildlife populations. The phenotypic consequences of selectively removing males with large horns or antlers from ungulate populations have been a topic of heightened concern in recent years. Harvest can affect size of horn-like structures in two ways: (a) shifting age structure toward younger age classes, which can reduce the mean size of horn-like structures, or (b) selecting against genes that produce large, fast-growing males. We evaluated effects of age, climatic and forage conditions, and metrics of harvest on horn size and growth of mountain sheep (Ovis canadensis ssp.) in 72 hunt areas across North America from 1981 to 2016. In 50% of hunt areas, changes in mean horn size during the study period were related to changes in age structure of harvested sheep. Environmental conditions explained directional changes in horn growth in 28% of hunt areas, 7% of which did not exhibit change before accounting for effects of the environment. After accounting for age and environment, horn size of mountain sheep was stable or increasing in the majority (~78%) of hunt areas. Age-specific horn size declined in 44% of hunt areas where harvest was regulated solely by morphological criteria, which supports the notion that harvest practices that are simultaneously selective and intensive might lead to changes in horn growth. Nevertheless, phenotypic consequences are not a foregone conclusion in the face of selective harvest; over half of the hunt areas with highly selective and intensive harvest did not exhibit age-specific declines in horn size.
Our results demonstrate that while harvest regimes are an important consideration, horn growth of harvested male mountain sheep has remained largely stable, indicating that changes in horn growth patterns are an unlikely consequence of harvest across most of North America. Utah Division of Wildlife Resources; National Wild Sheep Foundation (WSF); Wyoming Wild Sheep Foundation; Alberta Wild Sheep Foundation; California Wild Sheep Foundation; Arizona Desert Bighorn Sheep Society; Wyoming Governor's Big Game License Coalition; Iowa Foundation for North American Wild Sheep; Utah Foundation for North American Wild Sheep; Pope and Young Club. Open access journal.
Effects of Wolves on Elk and Cattle Behaviors: Implications for Livestock Production and Wolf Conservation
BACKGROUND: In many areas, livestock are grazed within wolf (Canis lupus) range. Predation and harassment of livestock by wolves creates conflict and is a significant challenge for wolf conservation. Wild prey, such as elk (Cervus elaphus), perform anti-predator behaviors. Artificial selection of cattle (Bos taurus) might have resulted in attenuation or absence of anti-predator responses, or in erratic and inconsistent responses. Regardless, such responses might have implications on stress and fitness. METHODOLOGY/PRINCIPAL FINDINGS: We compared elk and cattle anti-predator responses to wolves in southwest Alberta, Canada within home ranges and livestock pastures, respectively. We deployed satellite- and GPS-telemetry collars on wolves, elk, and cattle (n = 16, 10 and 78, respectively) and measured seven prey response variables during periods of wolf presence and absence (speed, path sinuosity, time spent head-up, distance to neighboring animals, terrain ruggedness, slope and distance to forest). During independent periods of wolf presence (n = 72), individual elk increased path sinuosity (Z = -2.720, P = 0.007) and used more rugged terrain (Z = -2.856, P = 0.004) and steeper slopes (Z = -3.065, P = 0.002). For cattle, individual as well as group behavioral analyses were feasible and these indicated increased path sinuosity (Z = -2.720, P = 0.007) and decreased distance to neighbors (Z = -2.551, P = 0.011). In addition, cattle groups showed a number of behavioral changes concomitant to wolf visits, with variable direction in changes. CONCLUSIONS/SIGNIFICANCE: Our results suggest both elk and cattle modify their behavior in relation to wolf presence, with potential energetic costs. Our study does not allow evaluating the efficacy of anti-predator behaviors, but indicates that artificial selection did not result in their absence in cattle. The costs of wolf predation on livestock are often compensated considering just the market value of the animal killed. 
However, society might consider refunding some additional costs (e.g., weight loss and reduced reproduction) that might be associated with the changes in cattle behaviors that we documented.